
[draft] new reader for curry files, using curry-python-reader code #13176


Draft · wants to merge 32 commits into main

Conversation

@dominikwelke (Contributor) commented Mar 26, 2025

hi all,
as discussed in #13033, here is a draft PR to use the official Curry reader code.

in a first step I just use the reader as a module (I only fixed formatting to force it past pre-commit). it has some drawbacks (e.g. data is always loaded) and I did not implement all possible data yet (e.g. HPI, or epoched recordings), but in general it already works pretty well. I tested it with all their example data, and with one of my own recordings that didn't work in MNE before.

it would be great to get some feedback on how you want me to proceed with this, @drammock @larsoner:

  • do we want to stick with the module approach, leaving their code untouched and working with its output (this would allow easier updating when they push changes)?
  • or should I merge the code more thoroughly, making it easier to maintain and clearer?

BACKGROUND:
the current Curry data reader can't handle all/newer Curry files. the plan is to port code from the official Curry reader into MNE-Python.

for permission, see #12855

closes #12795 #13033 #12855

@@ -0,0 +1,633 @@
# Authors: The MNE-Python contributors.
# License: BSD-3-Clause
# Copyright the MNE-Python contributors.
Member

is this file the official reader? is it copied from somewhere?

@dominikwelke (Contributor, Author) commented Mar 28, 2025

yes, it's copied from https://github.com/neuroscan/curry-python-reader

the false info you flagged was added by [autofix.ci]; further formatting changes were necessary to pacify the pre-commit hook

@dominikwelke (Contributor, Author) commented Mar 28, 2025

you discussed this topic with a Compumedics dev in #12855.

they said they won't supply a PyPI or conda version, but we are free to use the code. the GitHub repo has a BSD-3 license applied, but they don't include any note in the file itself

@drammock (Member)

do we want to stick with the module approach, leaving their code untouched and working with its output (would allow easier updating when they push changes), or should I merge the code more thoroughly, making it easier to maintain and clearer?

Given that @CurryKaiser refused our offer to help them package up their reader for PyPI / conda-forge, I see two remaining options:

  1. "vendor" their code. To make it slightly future-proof, we could write a script (in tools/ I guess) that fetches the latest code from their repo, auto-formats it to make it compatible with MNE's pre-commit requirements, and puts the (formatted but otherwise unmodified) code in mne/io/curry/_vendored.py. (This is basically a manual version of git submodule update because I don't think we should invoke git submodule for this use case.) We then adapt our code in mne/io/curry.py to be a wrapper around their code that basically just gets things into proper MNE container objects; and know that we might need to tweak our wrappers any time the vendored code is updated.

  2. Fully incorporate their code. Re-write their reader to align better with our codebase, in terms of variable names, idioms like _check_option or _validate_type, features like preload=False, etc.
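
For illustration, a minimal sketch of what such a fetch script could look like (the file name tools/vendor_curryreader.py, the upstream branch, and the upstream module name curryreader.py are assumptions, not verified paths):

```python
# Hypothetical tools/vendor_curryreader.py -- sketch of option 1's
# fetch-and-vendor idea; upstream branch/file names are assumptions.
from pathlib import Path
from urllib.request import urlopen

UPSTREAM = (
    "https://raw.githubusercontent.com/neuroscan/curry-python-reader/"
    "main/curryreader.py"
)
DEST = Path("mne/io/curry/_vendored.py")


def main():
    # Fetch the latest upstream reader and write it with a provenance note.
    code = urlopen(UPSTREAM).read().decode("utf-8")
    header = "# Vendored from neuroscan/curry-python-reader; do not edit.\n"
    DEST.write_text(header + code, encoding="utf-8")
    # Afterwards, run the project's formatters so pre-commit passes, e.g.
    # `pre-commit run --files mne/io/curry/_vendored.py`.


if __name__ == "__main__":
    main()
```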

Personally I lean toward option 2. I say this because if we're going to try to support curry files, at a minimum we need to be able to fix bugs when they arise, and ideally we should be willing/able to incorporate new features that have high demand from our users (preload=False is an obvious first example). But if we're fixing bugs, do we open PRs to upstream (with no guarantee of responsiveness), or tweak our "thin" wrapper to handle more and more corner cases? Neither option is appealing, so at that point it starts to seem easier to me to just maintain the entire reader ourselves.

@agramfort (Member) commented Mar 28, 2025 via email

@drammock (Member)

hum... my first reaction is to push a version 0.1 of their package on pypi and rely on this. Basically we maintain a fork and hope that the fork changes are accepted upstream... it feels less hacky and they also have a CI and some testing setup with test_data that I would not duplicate in mne-python...

that indeed is less hacky than my approach to vendoring. I'd be OK with that outcome, though curious what @larsoner will think.

@larsoner (Member)

I'm fine with that idea but it would be good to get some blessing/permission from them to do this

@drammock (Member)

@CurryKaiser

I'm fine with that idea but it would be good to get some blessing/permission from them to do this

xref to #12855 (comment) where I've asked for confirmation that Compumedics really doesn't want to be the packager and they're OK with us doing it.

@CurryKaiser

@CurryKaiser

I'm fine with that idea but it would be good to get some blessing/permission from them to do this

xref to #12855 (comment) where I've asked for confirmation that Compumedics really doesn't want to be the packager and they're OK with us doing it.

And nothing has changed, so all good from our side. Sorry we couldn't package it for you. And thank you for working on this!

@dominikwelke (Contributor, Author)

thanks @CurryKaiser !

ok, sounds like a plan... I can start working on this again soon, if you give the go-ahead @agramfort @drammock @larsoner

I guess the fork should live in the mne-tools org? I have the necessary rights to create it

@dominikwelke changed the title from "[draft] new reader for curry files, using curry-pyhon-reader code" to "[draft] new reader for curry files, using curry-python-reader code" on Apr 1, 2025
@larsoner (Member) commented Apr 1, 2025

Yeah I think so

@drammock (Member) commented Apr 1, 2025

Yeah I think so

I already made the fork

@agramfort (Member) commented Apr 1, 2025 via email

@drammock (Member) commented Apr 2, 2025

xref to mne-tools/curry-python-reader#1

@dominikwelke (Contributor, Author)

I could use some guidance on 2 things:

  1. channel locations:
    curry files come with channel locations, and for EEG it was straightforward to build a montage and apply it (see the sketch after this list).
    but for MEG it seems I need to use other functions. any pointers would help!
    do I need to populate info["dig"] directly?

  2. HPI/cHPI data:
    some MEG files seem to come with these data. how do I store them in the raw object?
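
For context on item 1, this is roughly what the EEG part looks like with the standard montage API (a minimal, self-contained sketch; the channel positions and fiducials are placeholder values, not from a real Curry file):

```python
import numpy as np
import mne

# Placeholder positions in meters, head coordinate frame (assuming the
# Curry file supplies metric xyz positions per channel).
ch_pos = {
    "Fp1": np.array([-0.03, 0.08, 0.04]),
    "Fp2": np.array([0.03, 0.08, 0.04]),
}
montage = mne.channels.make_dig_montage(
    ch_pos=ch_pos,
    nasion=[0.0, 0.10, 0.0],
    lpa=[-0.08, 0.0, 0.0],
    rpa=[0.08, 0.0, 0.0],
    coord_frame="head",
)

# Dummy Raw standing in for the object the reader constructs.
info = mne.create_info(["Fp1", "Fp2"], sfreq=1000.0, ch_types="eeg")
raw = mne.io.RawArray(np.zeros((2, 1000)), info)
raw.set_montage(montage)  # attaches the digitization to info["dig"]
```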

a few other things to discuss:

  • preload
    easiest would be not to offer preload=False and just load the data into memory.
    a single load_data() call would also be doable with the official reader, but a chunked reader not really (if we don't want to hack it, e.g. load all data and discard large parts). I'm not sure I'm deep enough into the MNE codebase to know what the implications are (e.g. computations, plots etc. with unloaded data).
    what are your thoughts?

  • epoched files
    the reader code looks as if there could be files with epoched recordings, but there are none among their sample files. do any of you know more about this? otherwise I'll ask the Curry devs

@dominikwelke (Contributor, Author)

p.s. could you remind me how to switch off the CIs when pushing these early commits?

@larsoner (Member) commented Apr 3, 2025

Push commits with [ci skip] in the commit message and the long / expensive CIs shouldn't run (a few quick ones still will, I think)

@larsoner (Member) commented Apr 3, 2025

... for the cHPI stuff it's probably easiest to load a Neuromag raw FIF file with cHPI info and look at how the info is stored for example in info["hpi_subsystem"]. You can also look at the Info docs, especially the notes. It's not complete but it will help.
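
For example (the file path is a placeholder; any Neuromag recording with cHPI info will do):

```python
import mne

# Placeholder path to a Neuromag raw FIF recording with cHPI info.
raw = mne.io.read_raw_fif("neuromag_chpi_raw.fif")
print(raw.info["hpi_subsystem"])  # how the cHPI metadata are laid out
print(raw.info["hpi_meas"])       # HPI measurement/fit information
```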

For preload, since preload=False is in main it would be a functionality regression to remove it. Once you know how the data are stored on disk and how to read an arbitrary time slice from it, it's really not bad to do the work to make preload=False work. So if you can figure this part out in some function, I can help you fit it into the _read_segment_file code. Since the .py file is only a few hundred lines (a lot of which seems like plotting etc. that we won't use), I'm cautiously optimistic we can figure it out and make it work. And then the curry-python-reader code can really be for reading metadata, annotations, sensor locs, etc., plus knowing where to read the data from disk. We can probably even keep the existing _read_segment_file, it should work in theory...
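
To sketch the shape of that work (assuming the Curry samples are stored as a flat little-endian float32 block following a header; `_read_segments_file` is a private MNE helper and the `data_offset` key is hypothetical):

```python
from mne.io import BaseRaw
from mne.io.utils import _read_segments_file  # private helper


class RawCurrySketch(BaseRaw):
    """Skeleton only -- not this PR's actual code."""

    def _read_segment_file(self, data, idx, fi, start, stop, cals, mult):
        # BaseRaw calls this lazily for an arbitrary slice [start, stop),
        # which is what makes preload=False work. For a flat float32 data
        # block the generic helper does the seeking/scaling; ``offset`` is
        # the byte size of the header preceding the samples.
        _read_segments_file(
            self, data, idx, fi, start, stop, cals, mult,
            dtype="<f4", offset=self._raw_extras[fi]["data_offset"],
        )
```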

@dominikwelke (Contributor, Author) commented Apr 14, 2025

ok, _read_segment_file does indeed work unchanged.
the reader should now be more or less functional

  • I'd still need some guidance on handling/storing the channel locations, esp. for MEG data
  • HPI data - looks like I got it wrong - there might not be cHPI data after all, only HPI marker locations, provided in different formats depending on the system

@dominikwelke (Contributor, Author)

@CurryKaiser
thanks for the permission to use the code, also from my side!

in another place you said you might be able to provide us with test files - could we perhaps get a small one with epoched recordings in it (format version shouldn't matter)?
your repository for the python reader contains some test files that the reader interprets as epoched, but they don't really seem to be (perhaps the files were truncated for size)

@CurryKaiser

Could be that they were truncated, let me check.

@CurryKaiser

Ok, try these:
EpochedData

@dominikwelke (Contributor, Author)

thanks for the file @CurryKaiser
fyi, we have now packaged and published curryreader on PyPI.
it can be installed via pip install curryreader

@dominikwelke (Contributor, Author)

@drammock @larsoner @agramfort
it is on PyPI but not on conda-forge - how is this case dealt with in MNE? should we also submit it to conda-forge?

currently pip install mne[full] fetches it, but conda env create --file environment.yml doesn't

related question:
which pip dependency level in pyproject.toml should this go into? I treated curryreader like the antio package (for ANT Neuro files), but this makes it an optional requirement (in mne[full]). I believe this means it won't be automatically installed when calling pip install mne?
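
For what it's worth, optional reader dependencies are typically imported lazily inside the reader, so a plain pip install mne works until someone actually reads a Curry file. A generic sketch of that pattern (the error-message wording is illustrative, not MNE's actual helper):

```python
def read_raw_curry(fname):
    # Import lazily so 'curryreader' is only required when actually used.
    try:
        import curryreader
    except ImportError:
        raise ImportError(
            "the 'curryreader' package is required to read Curry files; "
            "install it via 'pip install curryreader' or "
            "'pip install mne[full]'"
        ) from None
    ...  # parse with curryreader and build the Raw object
```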

@larsoner (Member)

it is on PyPI but not on conda-forge - how is this case dealt with in MNE? should we also submit it to conda forge?

Yeah use grayskull, it's not too painful, see for example conda-forge/staged-recipes#28279

@dominikwelke (Contributor, Author)

Yeah use grayskull,

see conda-forge PR: conda-forge/staged-recipes#29754

@dominikwelke (Contributor, Author)

I had some deadlines and will now be on vacation for a while. I'll continue working on the PR afterward

@drammock (Member)

users are asking for this; xref to forum post: https://mne.discourse.group/t/mne-io-read-raw-curry-for-curry-9/11245/2

@dominikwelke (Contributor, Author)

users are asking for this; xref to forum post: https://mne.discourse.group/t/mne-io-read-raw-curry-for-curry-9/11245/2

good to hear. I just came back from annual leave and can start working on it again

@dominikwelke (Contributor, Author)

hi @larsoner @drammock -
this is now doing what it should, and it would be great to get some more in-depth review!
thanks in advance for your feedback :)

specific questions:

  • sensor digitization:
    could someone with a better understanding of how MNE stores these have a look?
    curry files can store EEG locations and MEG locations, all as metric xyz coordinates (I think). especially in the case of MEG I'm not 100% confident that I set all FIFF flags correctly, or whether there are transformation steps I missed.
    what I implemented partly copies the previous MNE reader version, and a 3D plot of sensor locations (EEG+MEG) looks OK.

  • tests:
    feel free to suggest improvements, or extensions to better cover my code.

  • API 1:
    curry files can store continuous or epoched data. for epoched data, I default to returning an instance of Epochs but offer an option to return a Raw with annotations (import_epochs_as_annotations=False).
    good like this, or should I change the default / rename the argument?
    (ps. maybe it can also store evoked data - the curryreader returns the number of averages per epoch, which suggests this might be the case, but I don't have example data for this, so I currently raise NotImplementedError. @CurryKaiser: can you tell us more about this? perhaps provide a small example file with evoked/averaged recordings?)

  • API 2:
    apart from the reader read_raw_curry I also added functions to read the EEG montage (read_dig_curry) and impedance measurements (read_impedances_curry). do I need to add them somewhere in the docs?
    the latter could also be included in https://mne.tools/stable/auto_examples/io/read_impedances.html, I guess?

  • potential privacy issue:
    curry files can include amplifier/device information. I currently just dump this into device_info.type.
    in some example files I saw that this info can be very long and include serial numbers and paths to local files.
    as I didn't find any spec for the Curry format, it's not easy to split the string meaningfully, to store the identifiable parts in the device_info fields that are overwritten by MNE's anonymization functions.
    how should I proceed with this?
    @CurryKaiser: is there a fixed structure to the information stored in the AmplifierInfo field that I could use?
    example:

Synamps MEG Headbox 1: Digital(PN:8509 SN:0029) Analog(PN:9227 SN:0020) FW:01 -- Headbox 2: Digital(PN:8509 SN:0063) Analog(PN:9227 SN:0059) FW:01 -- Headbox 3: Digital(PN:8509 SN:0026) Analog(PN:9227 SN:0045) FW:01 -- Headbox 4: Digital(PN:8509 SN:0019) Analog(PN:8548 SN:0012) FW:01 -- Headbox 5: Digital(PN:8509 SN:0078) Analog(PN:8548 SN:0070) FW:01 (SynAmps MEG), LP-Filter: order 2 Butterworth (Configuration: 186 MEG + Quik-Cap Net 128 01_23_2020) || MEG: H1 conf: E:\Curry_Configs\BNI_A_parameters20200220.cfg (Thu Feb 20 17:21:42 2020¶) | H1 sens: ./OrionMEG/MEG_186_system_CURRY_10092019.cfg (Wed Sep 11 02:57:42 2019¶) | H1 geom: ./OrionMEG/MEG_186_10092019 (Mon Sep 30 06:35:12 2019¶)

@dominikwelke (Contributor, Author) commented Jul 21, 2025

btw, the style issue says: mne/decoding/base.py:272: unused attribute 'required' (60% confidence)
doesn't seem to be me?

@larsoner (Member)

sensor digitization:

Yeah I can look a bit deeper... before I do, what's the situation on main? Are sensor positions loaded when you read_raw_curry?

The short version in case you want to think about if things are done correctly:

  1. EEG-like sensors are stored in head coord frame, i.e., the right-handed RAS coordinate frame formed by LPA, RPA, and Nasion.
  2. MEG-like sensor positions are stored in the MEG coord frame, which has a device-dependent origin (and up/orientation) but is otherwise RAS.
  3. info["dev_head_t"] stores the coordinate transformation between the head and MEG coordinate frame. This should be non-None for recordings with humans.

Is this how things work in this PR (and on main)?
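
To make item 3 concrete, the transform is an mne.transforms.Transform; a minimal sketch (the identity matrix is a placeholder, a real recording would carry the measured device-to-head transform):

```python
import numpy as np
import mne

# Placeholder: identity device-to-head transform (real data would use the
# one measured via HPI/fiducials during acquisition).
dev_head_t = mne.transforms.Transform("meg", "head", np.eye(4))
print(dev_head_t)
# Inside a reader this ends up in info["dev_head_t"] (Info is normally
# locked, so readers set it while constructing the object).
```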

[read_impedances_curry] could also be included in https://mne.tools/stable/auto_examples/io/read_impedances.html, i guess?

Sure!

store this identifiable stuff in the device_info items that are overwritten by MNEs anonymization functions.

Seems reasonable to me

curry files can store continuous or epoched data. for epoched data, i default to return an instance of Epochs but offer option to return a Raw with annotations (import_epochs_as_annotations=False).

Why have an option to import as raw? To me this seems like a more general problem, for which we might want an Epochs method epochs.as_raw() that does the wrapping back to a 2D array/structure with annotations or whatever. Can we live without this for now and see if people end up wanting it?
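
A rough sketch of what such a hypothetical epochs.as_raw() could do (it ignores baseline correction, epoch overlap, and the gaps between epochs in the original recording):

```python
import numpy as np
import mne


def epochs_to_raw(epochs):
    """Hypothetical helper: flatten Epochs back into an annotated Raw."""
    data = epochs.get_data()  # shape (n_epochs, n_channels, n_times)
    n_epochs, _, n_times = data.shape
    # Concatenate epochs along time into a 2D array.
    raw = mne.io.RawArray(np.hstack(data), epochs.info.copy())
    # One annotation per epoch, laid out back-to-back.
    sfreq = epochs.info["sfreq"]
    onsets = np.arange(n_epochs) * n_times / sfreq
    id_to_name = {v: k for k, v in epochs.event_id.items()}
    descriptions = [id_to_name[code] for code in epochs.events[:, 2]]
    raw.set_annotations(
        mne.Annotations(onsets, n_times / sfreq, descriptions)
    )
    return raw
```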

@larsoner (Member)

btw, the style issue says: mne/decoding/base.py:272: unused attribute 'required' (60% confidence)

Only tangentially related... vulture uses heuristics, and I think because you removed the required from some function in the curry code, it now sees code in mne/decoding/base.py with a .required as being unused anywhere 🤷 To work around this, you should just be able to add _.required to the vulture_allowlist.py and be good

@dominikwelke (Contributor, Author)

curry files can store continuous or epoched data. for epoched data, i default to return an instance of Epochs but offer option to return a Raw with annotations (import_epochs_as_annotations=False).

Why have an option to import as raw? To me this seems like a more general problem that we might want to have an Epochs method epochs.as_raw() that does the wrapping back to a 2D array/structure with annotations or whatever. Can we live without this for now and see if people end up wanting it?

I can remove the option, no problem.

@dominikwelke (Contributor, Author) commented Jul 21, 2025

store this identifiable stuff in the device_info items that are overwritten by MNEs anonymization functions.

Seems reasonable to me

the point was: it's not straightforward to parse and split the string ;)
fwiw, there could be anything in there

a few options are:

  1. leaving it as is, leading to cases with incomplete anonymization
  2. acting as if there were structure, and splitting the strings based on the limited examples I saw
  3. ignoring this info and leaving device_info empty
  4. dumping everything into one of the device_info fields that is overwritten, instead of device_info.type

@dominikwelke (Contributor, Author)

sensor digitization:

Yeah I can look a bit deeper... before I do, what's the situation on main? Are sensor positions loaded when you read_raw_curry?

yes, sensor locations are also loaded in the legacy version. as I said, I reuse parts of this code.

in the PR I do set the coordinate frames as intended (head for EEG, device for MEG), and for EEG I'm generally quite confident everything is fine (including RPA, LPA, and nasion).
for MEG not so much - I did set a head-dev transform, but I'd like feedback. thanks for having a look!

@dominikwelke (Contributor, Author)

@larsoner -
vulture is indeed happy now.
a new issue arose in check_neuromag2ft

Successfully merging this pull request may close these issues: MEG data problem consulting